87 research outputs found

    Fast and Compact Regular Expression Matching

    We study four problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem, either by improving the tabulation technique of an existing algorithm or by combining known algorithms in a new way.
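    The word-RAM parallelism the abstract relies on can be illustrated with the classic bit-parallel Shift-And matcher, which packs all NFA states of a pattern into one machine word so each text character is processed with O(1) word operations. This is only an illustrative sketch of the model, not the paper's algorithm:

    ```python
    def shift_and(text, pattern):
        """Bit-parallel Shift-And exact matching: state bit i means
        'pattern[0..i] matches a suffix of the text read so far'."""
        masks = {}                           # per-character bitmasks
        for i, c in enumerate(pattern):
            masks[c] = masks.get(c, 0) | (1 << i)
        accept = 1 << (len(pattern) - 1)     # bit for a full match
        state, hits = 0, []
        for pos, c in enumerate(text):
            # advance every partial match by one character in O(1) word ops
            state = ((state << 1) | 1) & masks.get(c, 0)
            if state & accept:
                hits.append(pos - len(pattern) + 1)
        return hits

    print(shift_and("abracadabra", "abra"))  # -> [0, 7]
    ```

    Because the whole state set fits in a word for patterns up to the word length, the inner loop costs constant time per character, which is the kind of speedup tabulation techniques generalize.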

    Dictionaries Revisited

    Dictionaries are probably the best-studied class of data structures. A dictionary supports insertions, deletions, and membership queries, and usually also successor, predecessor, and extract-min. Given their centrality to both the theory and practice of data structures, surprisingly basic questions about them remain unsolved and sometimes even unposed. This talk focuses on questions that arise from the disparity between the way large-scale dictionaries are analyzed and the way they are used in practice.

    Insertion Sort is O(n log n)

    Traditional Insertion Sort runs in O(n^2) time because each insertion takes O(n) time. When people run Insertion Sort in the physical world, they leave gaps between items to accelerate insertions. Gaps help in computers as well. This paper shows that Gapped Insertion Sort has insertion times of O(log n) with high probability, yielding a total running time of O(n log n) with high probability. Comment: 6 pages, LaTeX. In Proceedings of the Third International Conference on Fun With Algorithms, FUN 200
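    The gapped-array idea can be sketched as follows: keep empty slots interleaved with the sorted elements so an insertion only shifts elements until the nearest gap, and re-spread the gaps whenever the array gets too dense. This is a simplified illustration of the mechanism, not the authors' analyzed algorithm (in particular, the rebalancing schedule here is cruder than the one needed for the high-probability bounds):

    ```python
    def rebalance(arr):
        """Re-spread gaps: one empty slot around every element."""
        items = [x for x in arr if x is not None]
        out = [None] * (2 * len(items) + 1)
        for i, x in enumerate(items):
            out[2 * i + 1] = x
        return out

    def insert(arr, x):
        """Binary search over the gapped array, then shift right
        only until the nearest gap."""
        lo, hi = 0, len(arr)
        while lo < hi:
            mid = (lo + hi) // 2
            j = mid
            while j < hi and arr[j] is None:   # probe past gaps
                j += 1
            if j == hi or arr[j] >= x:
                hi = mid
            else:
                lo = j + 1
        pos = lo
        while pos < len(arr) and arr[pos] is not None:
            pos += 1                           # find nearest gap
        if pos == len(arr):
            arr.append(None)
        for k in range(pos, lo, -1):           # short shift into the gap
            arr[k] = arr[k - 1]
        arr[lo] = x

    def library_sort(xs):
        arr, count = [None], 0
        for x in xs:
            if count + 1 >= len(arr) // 2:     # too dense: re-spread gaps
                arr = rebalance(arr)
            insert(arr, x)
            count += 1
        return [v for v in arr if v is not None]
    ```

    With gaps kept evenly spread, the shift after each binary search touches few elements, which is the intuition behind the O(log n)-per-insertion claim.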

    Tight Bounds for Monotone Minimal Perfect Hashing

    The monotone minimal perfect hash function (MMPHF) problem is the following indexing problem. Given a set $S = \{s_1, \ldots, s_n\}$ of $n$ distinct keys from a universe $U$ of size $u$, create a data structure $DS$ that answers the following query: $\mathrm{RankOp}(q) = \text{rank of } q \text{ in } S$ for all $q \in S$, and an arbitrary answer otherwise. Solutions to the MMPHF problem are in widespread use in both theory and practice. The best upper bound known for the problem encodes $DS$ in $O(n \log\log\log u)$ bits and performs queries in $O(\log u)$ time. It has been an open problem to either improve the space upper bound or to show that this somewhat odd looking bound is tight. In this paper, we show the latter: specifically, that any data structure (deterministic or randomized) for monotone minimal perfect hashing of any collection of $n$ elements from a universe of size $u$ requires $\Omega(n \log\log\log u)$ expected bits to answer every query correctly. We achieve our lower bound by defining a graph $\mathbf{G}$ whose nodes are the $\binom{u}{n}$ possible inputs and where two nodes are adjacent if they cannot share the same $DS$. The size of $DS$ is then lower bounded by the log of the chromatic number of $\mathbf{G}$. Finally, we show that the fractional chromatic number (and hence the chromatic number) of $\mathbf{G}$ is lower bounded by $2^{\Omega(n \log\log\log u)}$.
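    To make the query concrete, here is the trivial baseline that an MMPHF improves on: store the sorted keys outright (about n log u bits) and answer RankOp by binary search. The point of the MMPHF relaxation is that answers on keys outside S may be arbitrary, which is what permits the much smaller O(n log log log u)-bit encodings. A minimal sketch of the baseline, not a space-efficient construction:

    ```python
    import bisect

    def build_rank_baseline(keys):
        """Trivial RankOp structure: store sorted keys, binary search.
        Correct for every q, so it spends ~n*log(u) bits; an MMPHF
        only needs to be correct for q in S."""
        s = sorted(keys)

        def rank_op(q):
            # rank of q in S when q is in S; for q not in S an MMPHF
            # would be allowed to return anything
            return bisect.bisect_left(s, q)

        return rank_op
    ```

    For example, `build_rank_baseline([10, 3, 7])` maps 3, 7, 10 to ranks 0, 1, 2.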

    GPU LSM: A Dynamic Dictionary Data Structure for the GPU

    We develop a dynamic dictionary data structure for the GPU, supporting fast insertions and deletions, based on the Log Structured Merge tree (LSM). Our implementation on an NVIDIA K40c GPU has an average update (insertion or deletion) rate of 225 M elements/s, 13.5x faster than merging items into a sorted array. The GPU LSM supports the retrieval operations of lookup, count, and range query with average rates of 75 M, 32 M, and 23 M queries/s, respectively. The trade-off for the dynamic updates is that the sorted array is almost twice as fast on retrievals. We believe that our GPU LSM is the first dynamic general-purpose dictionary data structure for the GPU. Comment: 11 pages, accepted to appear in the Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS'18)
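    The LSM idea underlying the structure can be shown with a small sequential sketch: keep sorted runs of geometrically growing size, let each insertion cascade merges until it finds an empty level, and answer lookups by searching the levels from newest to oldest. This is a CPU illustration of the general LSM technique, not the authors' GPU implementation:

    ```python
    from bisect import bisect_left

    class TinyLSM:
        """Sequential sketch of a Log-Structured Merge dictionary:
        levels[i] is either None or a sorted run of 2**i keys."""

        def __init__(self):
            self.levels = []

        def insert(self, key):
            run = [key]
            i = 0
            while True:
                if i == len(self.levels):
                    self.levels.append(None)
                if self.levels[i] is None:
                    self.levels[i] = run     # found an empty level
                    return
                run = self._merge(self.levels[i], run)  # cascade merge
                self.levels[i] = None
                i += 1

        @staticmethod
        def _merge(a, b):
            out, i, j = [], 0, 0
            while i < len(a) and j < len(b):
                if a[i] <= b[j]:
                    out.append(a[i]); i += 1
                else:
                    out.append(b[j]); j += 1
            return out + a[i:] + b[j:]

        def lookup(self, key):
            for lvl in self.levels:          # newest level first
                if lvl:
                    k = bisect_left(lvl, key)
                    if k < len(lvl) and lvl[k] == key:
                        return True
            return False
    ```

    Merging whole sorted runs is exactly the bulk, data-parallel work that maps well to a GPU, which is why the paper builds on the LSM rather than on pointer-based balanced trees.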

    Improved Distortion and Spam Resistance for PageRank

    For a directed graph $G = (V, E)$, a ranking function, such as PageRank, provides a way of mapping elements of $V$ to non-negative real numbers so that nodes can be ordered. Brin and Page argued that the stationary distribution $R(G)$ of a random walk on $G$ is an effective ranking function for queries on an idealized web graph. However, $R(G)$ is not defined for all $G$, and in particular, it is not defined for the real web graph. Thus, they introduced PageRank to approximate $R(G)$ for graphs $G$ with ergodic random walks while being defined on all graphs. PageRank is defined as a random walk on a graph, where with probability $1 - \epsilon$ a random out-edge is traversed, and with reset probability $\epsilon$ the random walk instead restarts at a node selected using a reset vector $\hat{r}$. Originally, $\hat{r}$ was taken to be uniform on the nodes, and we call this version UPR. In this paper, we introduce graph-theoretic notions of quality for ranking functions, specifically distortion and spam resistance. We show that UPR has high distortion and low spam resistance, and we show how to select an $\hat{r}$ that yields low distortion and high spam resistance. Comment: 36 pages
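    The random walk the abstract defines can be computed by standard power iteration: with probability $1-\epsilon$ follow a uniformly random out-edge, and with probability $\epsilon$ restart at a node drawn from the reset vector. A minimal sketch under the assumption that dangling nodes (no out-edges) also restart via the reset vector; this illustrates the definition, not the paper's construction of a good reset vector:

    ```python
    def pagerank(out_edges, r, eps=0.15, iters=100):
        """Power iteration for PageRank with reset probability eps and
        reset vector r (uniform r gives the UPR version).
        out_edges[u] is the list of nodes u links to."""
        n = len(out_edges)
        p = list(r)                              # start from the reset vector
        for _ in range(iters):
            nxt = [eps * r[v] for v in range(n)] # reset mass
            for u, nbrs in enumerate(out_edges):
                if nbrs:
                    share = (1 - eps) * p[u] / len(nbrs)
                    for v in nbrs:               # follow a random out-edge
                        nxt[v] += share
                else:
                    for v in range(n):           # dangling node: restart via r
                        nxt[v] += (1 - eps) * p[u] * r[v]
            p = nxt
        return p
    ```

    On a 3-cycle with a uniform reset vector the scores stay uniform, while a node with more in-links than its peers accumulates a larger share of the stationary mass.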

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Preface
